GLSL compiler: 

In the last year

* Committed long-awaited geometry shader support (recently for Sandybridge too!)
* Jumped from GLSL 1.40 to GLSL 3.30
* Tons of new extensions
- separate shader objects (4.1)
- shader atomic counters (4.2)
- sample shading (4.0)
- derivative control (4.5)
- gpu shader5 (4.0)
- viewport array (4.1)
- explicit uniform location (4.3)

In the last year

* Tons of easy algebraic optimizations
- Amazing (and a bit disappointing) how many programs these help
* “Vectorizing” multiple scalar operations
- Amazing how bad code from DX translators can be
* Finally implemented common subexpression elimination (kind of...)
- Only works on constants and uniforms
* Realizing more and more that a tree-based IR makes things difficult
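The "easy algebraic optimizations" are small pattern rewrites over the expression tree. As a rough sketch, assuming a toy tuple-based tree rather than Mesa's GLSL IR, the hypothetical `simplify` function below applies a few identities bottom-up, including folding an open-coded `x*(1-t) + y*t` into a single `lrp` node, the same kind of rewrite as the "open-coded lrp" commit in the shader-db example. The `x + 0 -> x` rule ignores the IEEE signed-zero corner case, and the lrp pattern only matches this one operand order.

```python
# Toy expression trees (hypothetical representation, not Mesa's GLSL IR):
#   ("var", name), ("const", value), or (op, operand, ...)

def is_const(e, value):
    """True if e is a constant node holding `value`."""
    return e[0] == "const" and e[1] == value


def simplify(e):
    """Rewrite e bottom-up using a few algebraic identities."""
    if e[0] in ("var", "const"):
        return e
    op, *args = e
    args = [simplify(a) for a in args]

    if op == "mul":
        a, b = args
        if is_const(b, 1.0):          # x * 1 -> x
            return a
        if is_const(a, 1.0):          # 1 * x -> x
            return b
    elif op == "add":
        a, b = args
        # x + 0 -> x (ignores the IEEE signed-zero corner case)
        if is_const(b, 0.0):
            return a
        if is_const(a, 0.0):
            return b
        # Open-coded lrp: x * (1 - t) + y * t -> lrp(x, y, t)
        if a[0] == "mul" and b[0] == "mul":
            x, one_minus_t = a[1], a[2]
            y, t = b[1], b[2]
            if (one_minus_t[0] == "sub" and is_const(one_minus_t[1], 1.0)
                    and one_minus_t[2] == t):
                return ("lrp", x, y, t)
    elif op == "neg":
        (a,) = args
        if a[0] == "neg":             # -(-x) -> x
            return a[1]
    return (op, *args)
```

Because operands are simplified first, rules compose: `0 + 1 * t` collapses all the way to `t` in a single call.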

In the last year... in the i965 backend

* New SEL instruction peephole, dead control flow elimination
* Significant improvements to register allocation and instruction scheduling
* Rewritten vec4 and scalar dead code elimination passes
* Lots of register coalescing improvements
* New vec4 CSE pass
* Preserving the control flow graph across all optimization passes
* Realizing more and more that we want an SSA-based IR
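At heart, a dead code elimination pass is a backward liveness walk. A minimal single-basic-block sketch, assuming a hypothetical `Inst` record with a destination, source registers, and a side-effect flag: walking backward, an instruction is kept only if it has side effects or writes a register that is live at that point. Because the walk is backward, an instruction whose only readers turn out to be dead is removed in the same pass. A whole-program version also needs per-block live-out sets computed over the CFG, which is part of why keeping the CFG valid across passes pays off.

```python
from typing import NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    """Hypothetical instruction record for illustration."""
    op: str
    dst: Optional[str]          # None if the instruction writes no register
    srcs: Tuple[str, ...]       # registers read
    side_effects: bool = False  # e.g. framebuffer writes, stores


def dead_code_eliminate(block, live_out):
    """Drop instructions in one basic block whose results are never read.

    Walks the block backward, maintaining the set of registers live at
    each point; live_out is the set live on exit from the block.
    """
    live = set(live_out)
    kept = []
    for inst in reversed(block):
        needed = inst.side_effects or (inst.dst is not None and inst.dst in live)
        if not needed:
            continue  # result is dead; its sources do not become live
        if inst.dst is not None:
            live.discard(inst.dst)  # defined here, so not live above
        live.update(inst.srcs)      # its sources are live above
        kept.append(inst)
    kept.reverse()
    return kept
```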

How do we measure compiler improvements?

* Benchmarking games is often tedious and has a lot of variability
* apitrace traces don't work for benchmarking, for a number of reasons
* Individual optimizations are often too small to produce a detectable FPS change
* Would like to measure improvements in generated code more directly

shader-db

* Collection of shaders gathered from games and benchmarks
- Plus scripts to compile them and collect statistics
* 19599 *.shader_test files in my local checkout (GLSL and ARB vp/fp)
* Quick and easy to check whether an optimization helps or hurts real applications

glsl: Optimize open-coded lrp into lrp.

total instructions in shared programs: 1498191 -> 1487051 (-0.74%)
instructions in affected programs: 669388 -> 658248 (-1.66%)
GAINED: 1
LOST: 0
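A minimal sketch of how a summary in this format could be produced from two compiler runs, assuming each run yields a map from program name to instruction count. The `report` function is hypothetical and is not shader-db's own script. Treating GAINED/LOST as programs that appear in only one of the two runs is also an assumption (for i965, for example, a SIMD16 program that starts or stops compiling).

```python
def report(before, after):
    """Summarize instruction counts from two compiler runs (hypothetical helper).

    before/after map a program name to its instruction count.
    """
    shared = before.keys() & after.keys()
    affected = [p for p in shared if before[p] != after[p]]

    def totals(programs):
        return (sum(before[p] for p in programs),
                sum(after[p] for p in programs))

    def line(label, old, new):
        pct = 100.0 * (new - old) / old if old else 0.0
        return f"{label}: {old} -> {new} ({pct:.2f}%)"

    return "\n".join([
        line("total instructions in shared programs", *totals(shared)),
        line("instructions in affected programs", *totals(affected)),
        # Assumed meaning: programs present in only one of the two runs.
        f"GAINED: {len(after.keys() - before.keys())}",
        f"LOST: {len(before.keys() - after.keys())}",
    ])
```

Only programs present in both runs contribute to the totals, so a program that fails to compile in one run cannot masquerade as a large instruction-count change.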

GLSL code from DX translators

* What we get:

    r1.w = inversesqrt( r7.x );
    r2.w = inversesqrt( r7.y );
    r0.w = inversesqrt( r7.z );
    r7.x = 1.0 / r1.w;
    r1.w = inversesqrt( r7.w );
    r7.y = 1.0 / r2.w;
    r7.w = 1.0 / r1.w;
    r7.z = 1.0 / r0.w;

    vec4 cmp(in vec4 src0, in vec4 src1, in vec4 src2)
    {
        vec4 result;
        result.x = src0.x >= 0.0 ? src1.x : src2.x;
        result.y = src0.y >= 0.0 ? src1.y : src2.y;
        result.z = src0.z >= 0.0 ? src1.z : src2.z;
        result.w = src0.w >= 0.0 ? src1.w : src2.w;
        return result;
    }
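Per-component statements like these are what the "vectorizing" optimization targets. A rough sketch, assuming hypothetical `ScalarOp`/`VectorOp` records rather than Mesa's IR: consecutive scalar operations that share an opcode, a destination register, and a source register are merged into one operation with a combined write mask and swizzle. Merging is refused when source and destination are the same register, because a vector operation reads all of its inputs before writing, unlike the sequential scalar code. Under this simple rule the `inversesqrt` sequence in the translator output would stay unmerged, since its results land in different temporaries (r0, r1, r2).

```python
from typing import NamedTuple


class ScalarOp(NamedTuple):
    """dst.dst_comp = op(src.src_comp); hypothetical record."""
    dst: str
    dst_comp: str  # one of "xyzw"
    op: str
    src: str
    src_comp: str


class VectorOp(NamedTuple):
    """dst.mask = op(src.swizzle); hypothetical record."""
    dst: str
    mask: str      # written components, e.g. "xyz"
    op: str
    src: str
    swizzle: str   # source component for each written component


def vectorize(ops):
    """Greedily merge consecutive compatible scalar ops into vector ops."""
    out = []
    for s in ops:
        prev = out[-1] if out else None
        if (prev is not None and prev.op == s.op and prev.dst == s.dst
                and prev.src == s.src
                and s.src != s.dst              # avoid read-after-write hazards
                and s.dst_comp not in prev.mask):
            out[-1] = prev._replace(mask=prev.mask + s.dst_comp,
                                    swizzle=prev.swizzle + s.src_comp)
        else:
            out.append(VectorOp(s.dst, s.dst_comp, s.op, s.src, s.src_comp))
    return out


def to_glsl(v):
    """Render a VectorOp as a GLSL-like assignment."""
    return f"{v.dst}.{v.mask} = {v.op}({v.src}.{v.swizzle});"
```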

    vec4 cmp(in vec4 src0, in vec4 src1, in vec4 src2)
    {
        return mix(src2, src1, greaterThanEqual(src0, vec4(0.0)));
    }
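The rewrite is valid because GLSL's `mix` with a boolean-vector selector is a per-component select (it returns `y[i]` where `a[i]` is true and `x[i]` otherwise), not a blend. A small Python model, with hypothetical helper names standing in for the GLSL built-ins, makes the equivalence with the translator's `cmp` easy to check, including the `-0.0 >= 0.0` case.

```python
def cmp(src0, src1, src2):
    """Model of the translator's cmp(): per-component src0 >= 0 ? src1 : src2."""
    return [s1 if s0 >= 0.0 else s2 for s0, s1, s2 in zip(src0, src1, src2)]


def greater_than_equal(a, b):
    """Model of GLSL greaterThanEqual(): component-wise a >= b."""
    return [x >= y for x, y in zip(a, b)]


def mix_bvec(x, y, a):
    """Model of GLSL mix(x, y, bvec a): y[i] where a[i] is true, else x[i]."""
    return [yi if ai else xi for xi, yi, ai in zip(x, y, a)]


def cmp_as_mix(src0, src1, src2):
    """The single-expression form: mix(src2, src1, greaterThanEqual(src0, 0))."""
    return mix_bvec(src2, src1, greater_than_equal(src0, [0.0] * len(src0)))
```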

A year's worth of compiler improvements

total instructions in shared programs: 5777098 -> 4823707 (-16.50%)
instructions in affected programs: 5558170 -> 4604779 (-17.15%)
GAINED: 1717
LOST: 14

* SIMD16 programs increased from 88.6% (16401/18497) to 97.8%
* 43559 programs helped, 9512 unchanged, 110 hurt
* Cut number of loops in programs by ~10%
* Cut number of basic blocks by 16.49%
* Cut number of CFG calculations by 92%

Questions (so far)

The fires are (mostly?) out. What to do now?

* Have been reactionary for a long time
* New Steam games usually just work these days
- And if not, usually only small fixes are required
* Can afford to think about longer-term investments
* Lack of compiler infrastructure has hurt us in the past
- e.g., i965's fs dead code elimination pass without a CFG

What do we actually want? (i965 backend)

* SSA
- Existing optimization passes become more efficient and more effective
- Allows for new optimizations like GCM-GVN and divergence analysis
* An SSA-based register allocator
- Can register allocate in polynomial time! (Maybe!)
- Can make better decisions about register usage
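The polynomial-time claim rests on a known result: the interference graph of a program in strict SSA form is chordal, and chordal graphs can be colored optimally in polynomial time. A minimal sketch, assuming the interference graph is already built as a dict of neighbor sets (spilling, coalescing, and i965 register-region constraints are all ignored): maximum cardinality search (MCS) produces an ordering whose reverse is a perfect elimination ordering, and greedy coloring in MCS order then uses no more colors (registers) than the largest clique of simultaneously live values.

```python
def mcs_order(graph):
    """Maximum cardinality search over an undirected graph (dict of sets).

    Repeatedly visits the unvisited vertex with the most already-visited
    neighbours. For a chordal graph, the reverse of this order is a
    perfect elimination ordering.
    """
    weight = {v: 0 for v in graph}
    unvisited = set(graph)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: (weight[u], u))  # ties broken by name
        unvisited.remove(v)
        order.append(v)
        for n in graph[v]:
            if n in unvisited:
                weight[n] += 1
    return order


def color_chordal(graph):
    """Greedy colouring in MCS order; optimal when the graph is chordal."""
    colors = {}
    for v in mcs_order(graph):
        used = {colors[n] for n in graph[v] if n in colors}
        c = 0
        while c in used:  # smallest colour not used by a coloured neighbour
            c += 1
        colors[v] = c
    return colors
```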

What do we actually want? (glsl compiler)

* A flat (non-tree-based) SSA IR
- Wouldn't it be nice to do GCM-GVN in a place common to all drivers?
* To translate both to and from TGSI
- For drivers that don't want to write all of the same optimizations again
* Something other people (i.e., non-Intel) will also work on
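GCM-GVN refers to Cliff Click's global code motion / global value numbering. The value-numbering half is easy to express on a flat SSA instruction list, which is one reason a flat IR is attractive. This is a hedged, single-block sketch with a hypothetical `(dst, op, args)` tuple format; it assumes every op is pure and omits both code motion and the dominator-tree scoping a global version needs. Each instruction is keyed by its opcode and already-renumbered operands, with operands sorted for commutative ops. A repeated key means the result is redundant, so later uses are redirected to the earlier name.

```python
COMMUTATIVE = {"add", "mul", "min", "max"}


def value_number(insts):
    """Hash-based value numbering over pure SSA instructions in one block.

    insts: list of (dst, op, args) where args are SSA names or constants.
    Returns the list with redundant instructions removed and their uses
    rewritten to the equivalent earlier SSA name.
    """
    table = {}    # (op, canonical args) -> SSA name already computing it
    replace = {}  # redundant SSA name -> equivalent earlier name
    out = []
    for dst, op, args in insts:
        args = tuple(replace.get(a, a) for a in args)
        key_args = tuple(sorted(args, key=repr)) if op in COMMUTATIVE else args
        key = (op, key_args)
        if key in table:
            replace[dst] = table[key]  # same value computed earlier
        else:
            table[key] = dst
            out.append((dst, op, args))
    return out
```

Because SSA names are defined exactly once, a table entry can never be invalidated by a later redefinition, which is what keeps this lookup so simple compared with CSE on a non-SSA IR.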

Questions after Connor's talk